A Lightweight Semantic Chunker Based on Tagging
Author
Abstract
This paper introduces a framework for developing a fast, accurate, and highly portable semantic chunker. The framework is based on a non-overlapping, shallow tree-structured language. The derivation of the tree is treated as a sequence of tagging actions in a predefined linguistic context, and a novel semantic chunker is developed accordingly. It groups phrase chunks into the arguments of a given predicate in a bottom-up fashion. This differs from current approaches to semantic parsing or chunking, which depend on full statistical syntactic parsers that require treebank-style annotation. We compare it with a recently proposed word-by-word semantic chunker and present results showing that the phrase-by-phrase approach outperforms its word-by-word counterpart.
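To make the bottom-up grouping step concrete, the following is a minimal sketch of how per-chunk tags could be decoded into predicate arguments. It is not the paper's implementation: the IOB-style tag format, the role labels (`A0`, `A1`, `AM-LOC`), the example sentence, and the function name `group_arguments` are all assumptions made for illustration, and the tags are supplied by hand rather than predicted by a trained tagger.

```python
from typing import List, Optional, Tuple

Chunk = Tuple[str, str]  # (chunk type, chunk text), e.g. ("NP", "the cat")


def group_arguments(chunks: List[Chunk], tags: List[str]) -> List[Tuple[str, str]]:
    """Merge phrase chunks into labeled argument spans.

    `tags` holds one hypothetical IOB-style tag per chunk:
    "B-<role>" opens an argument, "I-<role>" continues it, "O" is outside.
    """
    if len(chunks) != len(tags):
        raise ValueError("chunks and tags must have the same length")

    arguments: List[Tuple[str, str]] = []
    label: Optional[str] = None
    words: List[str] = []

    for (_, text), tag in zip(chunks, tags):
        prefix, _, role = tag.partition("-")
        # Continue the open argument only if the role matches.
        if prefix == "I" and words and role == label:
            words.append(text)
            continue
        # Any other tag closes the open argument, if there is one.
        if words:
            arguments.append((label, " ".join(words)))
        label, words = None, []
        # "B-x" opens a new argument; a stray "I-x" is treated the same way.
        if prefix in ("B", "I") and role:
            label, words = role, [text]

    if words:
        arguments.append((label, " ".join(words)))
    return arguments


# Hand-written example for the predicate "bought" (labels are illustrative).
chunks = [
    ("NP", "The company"),
    ("VP", "bought"),
    ("NP", "a small startup"),
    ("PP", "in"),
    ("NP", "Boston"),
]
tags = ["B-A0", "O", "B-A1", "B-AM-LOC", "I-AM-LOC"]
arguments = group_arguments(chunks, tags)
```

Because the decoder works over phrase chunks rather than individual words, each argument is assembled from a handful of tagging decisions, which is the intuition behind the phrase-by-phrase setup the abstract contrasts with word-by-word chunking.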
Similar resources
Chunker and Shallow Parser for Free Word Order Languages: An Approach based on Valency Theory and Feature Structures
Free word order languages have relatively unrestricted local word group or phrase structures that make the problem of chunking quite challenging. On the other hand, a robust chunker can drastically reduce the complexity of a parser that follows. We present here a computational framework for chunking of free word order languages based on a generalization of the valency theory. Every word has cer...
Semantic Role Labeling by Tagging Syntactic Chunks
In this paper, we present a semantic role labeler (or chunker) that groups syntactic chunks (i.e. base phrases) into the arguments of a predicate. This is accomplished by casting the semantic labeling as the classification of syntactic chunks (e.g. NP-chunk, PP-chunk) into one of several classes such as the beginning of an argument (B-ARG), inside an argument (I-ARG) and outside an argument (O)...
SEIMCHA: a new semantic image CAPTCHA using geometric transformations
As protection of web applications is getting more and more important every day, CAPTCHAs are receiving growing attention from both users and designers. Nowadays, it is well accepted that using visual concepts enhances the security and usability of CAPTCHAs. There are a few major ideas for designing image CAPTCHAs. Some methods apply a set of modifications such as rotations to the original imag...
POS Tagger and Chunker for Tamil Language
This paper presents a part-of-speech tagger and chunker for Tamil using machine learning techniques. Part-of-speech tagging and chunking are fundamental processing steps for any language processing task. Part-of-speech (POS) tagging is the process of automatically annotating each word in a corpus with its syntactic category. Chunking is the task of identifying and segmenting the text...
Chinese Character-based Segmentation & POS-tagging and Named Entity Identification with a CRF Chunker
In this paper, we propose a character-based conditional random field (CRF) chunker to identify Chinese named entity words in text files. Its input comes from a character-based tagger in which segmentation and part-of-speech (POS) tagging are conducted simultaneously. The character-based tagger is trained on a corpus in which each character is tagged with both its position (POC...